Continuous Cloud-Scale Query Optimization and Processing

Authors

  • Nicolas Bruno
  • Sapna Jain
  • Jingren Zhou
Abstract

Massive data analysis in cloud-scale data centers plays a crucial role in making critical business decisions. High-level scripting languages free developers from understanding various system trade-offs, but introduce new challenges for query optimization. One key optimization challenge is missing accurate data statistics, typically due to massive data volumes and their distributed nature, complex computation logic, and frequent usage of user-defined functions. In this paper we propose novel techniques to adapt query processing in the Scope system, the cloud-scale computation environment in Microsoft Online Services. We continuously monitor query execution, collect actual runtime statistics, and adapt parallel execution plans as the query executes. We discuss similarities and differences between our approach and alternatives proposed in the context of traditional centralized systems. Experiments on large-scale Scope production clusters show that the proposed techniques systematically solve the challenge of missing/inaccurate data statistics, detect and resolve partition skew and plan structure, and improve query latency several-fold for real workloads. Although we focus on optimizing high-level languages, the same ideas are also applicable to MapReduce systems.
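
As a rough illustration of the idea of comparing runtime statistics against optimizer estimates and flagging partition skew, the following Python sketch may help. It is not the Scope implementation: the function names `detect_skew` and `should_reoptimize`, the median-based skew rule, and the threshold values are all assumptions chosen for the example.

```python
from statistics import median


def detect_skew(partition_rows, factor=4.0):
    """Return indices of partitions whose observed row count is far above the median.

    `factor` is a hypothetical threshold, not a value taken from the paper.
    """
    if not partition_rows:
        return []
    typical = median(partition_rows)
    if typical == 0:
        # Most partitions are empty, so any non-empty partition carries the work.
        return [i for i, rows in enumerate(partition_rows) if rows > 0]
    return [i for i, rows in enumerate(partition_rows) if rows > factor * typical]


def should_reoptimize(estimated_rows, observed_rows, tolerance=10.0):
    """Return True when the estimate and the observation differ by more than `tolerance`x.

    The ratio is symmetric, so both over- and under-estimates trigger it.
    """
    low, high = sorted((max(estimated_rows, 1), max(observed_rows, 1)))
    return high / low > tolerance


# Observed row counts per partition after a (hypothetical) stage completes.
observed = [100, 120, 90, 5000]
skewed = detect_skew(observed)
replan = should_reoptimize(estimated_rows=1000, observed_rows=sum(observed))
```

In this sketch a monitoring loop would call both checks as stages finish, and would only adapt the remaining plan when one of them fires, which keeps the adaptation cost proportional to how wrong the estimates turn out to be.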

Similar resources

Real-time query processing optimization for cloud-based wireless body area networks

Wireless body area networks (WBANs) have received a lot of attention from both academia and industry due to the increasing need of ubiquitous computing for eHealth applications, the continuous advances in miniaturization of electronic devices, and the ultra-low-power wireless technologies. In these networks, various sensors are attached either on clothes, on human body or even implanted under t...

Cost-Aware Query Optimization during Cloud-Based Complex Event Processing

Complex Event Processing describes the problem of timely and continuous processing of event streams. The load of Complex Event Processing systems can vary (e.g., event rates). Static resource provision leads to higher monetary costs because enough resources have to be provided to efficiently handle peak loads. Therefore, most of the time the resources will not be fully utilized. One way to achi...

Recurring Job Optimization for Massively Distributed Query Processing

Companies providing cloud-scale data services have increasing needs to store and analyze massive data sets. For cost and performance reasons, processing is typically done on large clusters of tens of thousands of commodity machines. Developers use high-level scripting languages that simplify understanding various system trade-offs, but introduce new challenges for query optimization. One key op...

Scalable RDF Graph Querying Using Cloud Computing

With the explosion of the semantic web technologies, conventional SPARQL processing tools do not scale well for large amounts of RDF data because they are designed for use on a single-machine context. Several optimization solutions combined with cloud computing technologies have been proposed to overcome these drawbacks. However, these approaches only consider the SPARQL Basic Graph Pattern pro...

Multi-Join Query Optimization for Read-Optimized Data Warehouse in a Cloud Environment

Read-Optimized databases are well suited for read intensive Data Warehouse applications. In addition, data in these applications grow rapidly and hence need a dynamically scalable environment like Cloud. Cloud provides a flexible environment where user can load data, execute queries and scale resources on demand. As the resources are scaled up, the number of nodes involved in the execution of q...

Journal title:
  • PVLDB

Volume 6  Issue 

Pages  -

Publication date 2013